Smart Paper Notes

Variational Inference with Gaussian Mixture by Entropy Approximation

Takashi Furuya , Hiroyuki Kusumoto , Koichi Taniguchi , Naoya Kanno , Kazuma Suetake

Categories: Machine Learning (Statistics) | Machine Learning

2022-02-26

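Variational inference is a technique for approximating intractable posterior distributions in order to quantify the uncertainty of machine learning models. Although a unimodal Gaussian distribution is usually chosen as the parametric distribution, it can hardly approximate multimodality. In this paper, we employ a Gaussian mixture distribution as the parametric distribution. The main difficulty of variational inference with a Gaussian mixture is how to approximate the entropy of the mixture. We approximate the entropy of the Gaussian mixture as the sum of the entropies of unimodal Gaussians, which can be calculated analytically. In addition, we theoretically analyze the approximation error between the true entropy and the approximated one in order to reveal when our approximation works well. Specifically, the approximation error is controlled by the ratios of the distances between the means to the sum of the variances of the Gaussian mixture. Furthermore, it converges to zero when the ratios go to infinity. Because of the curse of dimensionality, this situation seems more likely to occur in higher-dimensional parameter spaces. Therefore, our result guarantees that our approximation works well, for example, in neural networks with a large number of weights.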
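
The behaviour described in this abstract can be illustrated with a small numerical sketch. The abstract does not give the exact formula, so the code below assumes the approximation is the weighted sum of the component entropies plus the entropy of the mixing weights (the value the true entropy takes when components do not overlap). The function names and the Monte Carlo comparison are illustrative choices, not taken from the paper.

```python
import numpy as np

def gaussian_entropy(cov):
    """Closed-form entropy of a multivariate Gaussian with covariance `cov`."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)

def approx_mixture_entropy(weights, covs):
    """ASSUMED approximation: weighted component entropies plus mixing-weight entropy."""
    comp = sum(w * gaussian_entropy(c) for w, c in zip(weights, covs))
    return float(comp - np.sum(weights * np.log(weights)))

def log_mixture_density(x, weights, means, covs):
    """Log density of the mixture at each row of x (log-sum-exp over components)."""
    logs = []
    for w, m, c in zip(weights, means, covs):
        diff = x - m
        quad = np.sum(diff * np.linalg.solve(c, diff.T).T, axis=1)
        _, logdet = np.linalg.slogdet(c)
        logs.append(np.log(w) - 0.5 * (len(m) * np.log(2.0 * np.pi) + logdet + quad))
    logs = np.stack(logs)
    peak = logs.max(axis=0)
    return peak + np.log(np.exp(logs - peak).sum(axis=0))

def mc_mixture_entropy(weights, means, covs, n=20000, seed=0):
    """Monte Carlo estimate of the mixture entropy, used here as the reference value."""
    rng = np.random.default_rng(seed)
    ks = rng.choice(len(weights), size=n, p=weights)
    x = np.empty((n, len(means[0])))
    for k in range(len(weights)):
        idx = ks == k
        x[idx] = rng.multivariate_normal(means[k], covs[k], size=int(idx.sum()))
    return float(-np.mean(log_mixture_density(x, weights, means, covs)))

if __name__ == "__main__":
    weights = np.array([0.5, 0.5])
    covs = [np.eye(2), np.eye(2)]
    for gap in (0.5, 10.0):  # distance between the two means
        means = [np.zeros(2), np.array([gap, 0.0])]
        approx = approx_mixture_entropy(weights, covs)
        reference = mc_mixture_entropy(weights, means, covs)
        print(f"gap={gap}: approx={approx:.3f}, monte_carlo={reference:.3f}")
```

With unit variances, increasing the gap between the means increases the distance-to-variance ratio, and the gap between the assumed approximation and the Monte Carlo reference shrinks accordingly.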

translated by Google Translate

Achieving Transparency in Distributed Machine Learning with Explainable Data Collaboration

Anna Bogdanova , Akira Imakura , Tetsuya Sakurai , Tomoya Fujii , Teppei Sakamoto , Hiroyuki Abe

Categories: Machine Learning | Artificial Intelligence

2022-12-06

Transparency of Machine Learning models used for decision support in various industries becomes essential for ensuring their ethical use. To that end, feature attribution methods such as SHAP (SHapley Additive exPlanations) are widely used to explain the predictions of black-box machine learning models to customers and developers. However, a parallel trend has been to train machine learning models in collaboration with other data holders without accessing their data. Such models, trained over horizontally or vertically partitioned data, present a challenge for explainable AI because the explaining party may have a biased view of background data or a partial view of the feature space. As a result, explanations obtained from different participants of distributed machine learning might not be consistent with one another, undermining trust in the product. This paper presents an Explainable Data Collaboration Framework based on a model-agnostic additive feature attribution algorithm (KernelSHAP) and Data Collaboration method of privacy-preserving distributed machine learning. In particular, we present three algorithms for different scenarios of explainability in Data Collaboration and verify their consistency with experiments on open-access datasets. Our results demonstrated a significant (by at least a factor of 1.75) decrease in feature attribution discrepancies among the users of distributed machine learning.
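
The inconsistency this abstract describes can be reproduced on a toy problem. The sketch below computes exact Shapley values by enumerating feature subsets (KernelSHAP approximates these values by sampling) and shows that two parties holding different background data obtain different attributions for the same model and input. The model, the data, and the function name `shapley_values` are hypothetical; this is not the paper's framework.

```python
from itertools import combinations
from math import factorial

import numpy as np

def shapley_values(model, x, background):
    """Exact Shapley values of `model` at `x`; absent features are taken from `background` rows."""
    n = len(x)

    def value(subset):
        data = background.copy()
        cols = list(subset)
        if cols:
            data[:, cols] = x[cols]  # features in the coalition are fixed to the explained input
        return float(np.mean(model(data)))

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi

# Hypothetical linear model shared by both parties.
w = np.array([2.0, -1.0, 0.5])

def model(data):
    return data @ w

x = np.array([1.0, 2.0, 3.0])
# Each party only sees its own (differently distributed) background rows.
bg_party_a = np.array([[0.0, 0.0, 0.0], [0.0, 2.0, 2.0]])
bg_party_b = np.array([[3.0, 1.0, 0.0], [3.0, 1.0, 4.0]])

phi_a = shapley_values(model, x, bg_party_a)
phi_b = shapley_values(model, x, bg_party_b)

if __name__ == "__main__":
    print("party A attributions:", phi_a)
    print("party B attributions:", phi_b)
```

For a linear model, each attribution reduces to the weight times the difference between the input feature and the background mean, which is why the two parties disagree whenever their background means differ.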

Hybrid Life: Integrating Biological, Artificial, and Cognitive Systems

Manuel Baltieri , Hiroyuki Iizuka , Olaf Witkowski , Lana Sinapayen , Keisuke Suzuki

Categories: Artificial Intelligence

2022-12-01

Artificial life is a research field studying what processes and properties define life, based on a multidisciplinary approach spanning the physical, natural and computational sciences. Artificial life aims to foster a comprehensive study of life beyond "life as we know it" and towards "life as it could be", with theoretical, synthetic and empirical models of the fundamental properties of living systems. While still a relatively young field, artificial life has flourished as an environment for researchers with different backgrounds, welcoming ideas and contributions from a wide range of subjects. Hybrid Life is an attempt to bring attention to some of the most recent developments within the artificial life community, rooted in more traditional artificial life studies but looking at new challenges emerging from interactions with other fields. In particular, Hybrid Life focuses on three complementary themes: 1) theories of systems and agents, 2) hybrid augmentation, with augmented architectures combining living and artificial systems, and 3) hybrid interactions among artificial and biological systems. After discussing some of the major sources of inspiration for these themes, we will focus on an overview of the works that appeared in Hybrid Life special sessions, hosted by the annual Artificial Life Conference between 2018 and 2022.

SLOPT: Bandit Optimization Framework for Mutation-Based Fuzzing

Yuki Koike , Hiroyuki Katsura , Hiromu Yakura , Yuma Kurogome

Categories: Machine Learning

2022-11-07

Mutation-based fuzzing has become one of the most common vulnerability discovery solutions over the last decade. Fuzzing can be optimized when targeting specific programs, and given that, some studies have employed online optimization methods to do it automatically, i.e., tuning fuzzers for any given program in a program-agnostic manner. However, previous studies have neither fully explored mutation schemes suitable for online optimization methods, nor online optimization methods suitable for mutation schemes. In this study, we propose an optimization framework called SLOPT that encompasses both a bandit-friendly mutation scheme and mutation-scheme-friendly bandit algorithms. The advantage of SLOPT is that it can generally be incorporated into existing fuzzers, such as AFL and Honggfuzz. As a proof of concept, we implemented SLOPT-AFL++ by integrating SLOPT into AFL++ and showed that the program-agnostic optimization delivered by SLOPT enabled SLOPT-AFL++ to achieve higher code coverage than AFL++ in all of ten real-world FuzzBench programs. Moreover, we ran SLOPT-AFL++ against several real-world programs from OSS-Fuzz and successfully identified three previously unknown vulnerabilities, even though these programs have been fuzzed by AFL++ for a considerable number of CPU days on OSS-Fuzz.
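
To make the idea of bandit-driven mutation selection concrete, here is a minimal sketch that treats each mutation operator as a bandit arm and picks arms with the textbook UCB1 rule. It is not SLOPT's algorithm or mutation scheme: the operator names, the simulated probabilities of finding new coverage, and the binary reward are all assumptions made for illustration.

```python
import math
import random

def ucb1_select(counts, rewards, t):
    """Return the arm to play at round t under UCB1; unplayed arms are tried first."""
    for arm, count in enumerate(counts):
        if count == 0:
            return arm
    scores = [
        rewards[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)

def run_bandit(success_prob, rounds=5000, seed=0):
    """Simulate operator selection; reward 1.0 stands in for 'new coverage found'."""
    rng = random.Random(seed)
    counts = [0] * len(success_prob)
    rewards = [0.0] * len(success_prob)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, rewards, t)
        reward = 1.0 if rng.random() < success_prob[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += reward
    return counts

if __name__ == "__main__":
    # Hypothetical operators and hypothetical success probabilities.
    operators = ["bitflip", "arithmetic", "splice"]
    counts = run_bandit([0.1, 0.2, 0.6])
    for name, count in zip(operators, counts):
        print(f"{name}: selected {count} times")
```

In a real fuzzer the reward would come from the coverage feedback of each executed input, and the arm set would follow the fuzzer's own mutation operators.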

Non-readily identifiable data collaboration analysis for multiple datasets including personal information

Akira Imakura , Tetsuya Sakurai , Yukihiko Okada , Tomoya Fujii , Teppei Sakamoto , Hiroyuki Abe

Categories: Machine Learning

2022-08-31

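Multi-source data fusion, in which multiple data sources are jointly analyzed to obtain improved information, has attracted considerable research attention. For datasets held by multiple medical institutions, data confidentiality and cross-institutional communication are critical. In such cases, data collaboration (DC) analysis, which shares dimensionality-reduced intermediate representations without iterative cross-institutional communication, may be appropriate. When analyzing data that include personal information, the identifiability of the shared data is essential. In this study, the identifiability of DC analysis is investigated. The results reveal that the shared intermediate representations are readily identifiable to the original data for supervised learning. This study then proposes a non-readily identifiable DC analysis that shares only non-readily identifiable data for multiple medical datasets including personal information. The proposed method addresses identifiability concerns based on random sample permutation, the concept of interpretable DC analysis, and the use of functions that cannot be reconstructed. In numerical experiments on medical datasets, the proposed method exhibits non-ready identifiability while maintaining the high recognition performance of conventional DC analysis. For a hospital dataset, the proposed method shows a nine-percentage-point improvement in recognition performance over local analysis that uses only the local dataset.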

NRBdMF: A recommendation algorithm for predicting drug effects considering directionality

Iori Azuma , Tadahaya Mizuno , Hiroyuki Kusuhara

Categories: Machine Learning

2022-08-05

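Predicting novel effects of drugs based on information about approved drugs can be regarded as a recommendation system. Matrix factorization is one of the most commonly used recommendation systems, and various algorithms have been devised for it. A literature survey and summary of existing algorithms for predicting drug effects showed that most such methods, including neighborhood regularized logistic matrix factorization, which was the best performer in benchmark tests, use a binary matrix that considers only the presence or absence of interactions. However, drug effects are known to have two opposite aspects, such as side effects and therapeutic effects. In the present study, we propose using neighborhood regularized bidirectional matrix factorization (NRBdMF) to predict drug effects by incorporating bidirectionality, which is a characteristic property of drug effects. We used the proposed method to predict side effects using a matrix that accounts for the bidirectionality of drug effects, in which known side effects were assigned a positive (+1) label and known therapeutic effects were assigned a negative (-1) label. The NRBdMF model, which uses bidirectional drug information, achieved enrichment of side effects at the top of the prediction list and of indications at the bottom. This first attempt to consider the bidirectional nature of drug effects using NRBdMF showed that it reduces false positives and produces a highly interpretable output.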
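
The signed labelling described in this abstract can be sketched with a plain matrix factorization. The code below is not NRBdMF: it omits neighborhood regularization, uses squared loss on every entry, and runs on a tiny hypothetical drug-by-effect matrix. It only shows how +1 (side effect), -1 (therapeutic effect), and 0 (unknown) entries can be fitted by a low-rank model whose reconstruction then assigns a signed score to the unknown entry.

```python
import numpy as np

def factorize(Y, rank=1, lr=0.05, reg=0.01, steps=3000, seed=0):
    """Fit Y ~ U @ V.T by gradient descent on squared error with L2 regularization."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((Y.shape[0], rank))
    V = 0.1 * rng.standard_normal((Y.shape[1], rank))
    for _ in range(steps):
        err = U @ V.T - Y            # residual on every entry, including unknown (0) ones
        grad_U = err @ V + reg * U
        grad_V = err.T @ U + reg * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

# Hypothetical matrix: rows are drugs, columns are effects.
# +1 = known side effect, -1 = known therapeutic effect, 0 = unknown.
Y = np.array([
    [1.0, -1.0],
    [1.0, -1.0],
    [1.0,  0.0],   # the second effect of the third drug is unknown
])

U, V = factorize(Y)
scores = U @ V.T

if __name__ == "__main__":
    print(np.round(scores, 2))
```

Sorting the scores for an effect from high to low would place predicted side effects toward the top and predicted therapeutic effects toward the bottom, which matches the ranking behaviour the abstract reports.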

A Lightweight Transmission Parameter Selection Scheme Using Reinforcement Learning for LoRaWAN

Aohan Li , Ikumi Urabe , Minoru Fujisawa , So Hasegawa , Hiroyuki Yasuda , Song-Ju Kim , Mikio Hasegawa

Categories: Machine Learning

2022-08-03

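The number of IoT devices is predicted to reach 125 billion by 2023. The growth in IoT devices will intensify collisions between devices, degrading communication performance. Selecting appropriate transmission parameters, such as the channel and spreading factor (SF), can effectively reduce collisions between long-range (LoRa) devices. However, most of the schemes proposed in the current literature are not easy to implement on IoT devices with limited computational capability and memory. To solve this issue, we propose a lightweight transmission-parameter selection scheme, namely a joint channel and SF selection scheme using reinforcement learning for low-power wide-area networking (LoRaWAN). In the proposed scheme, appropriate transmission parameters can be selected using only acknowledgement (ACK) information. In addition, we theoretically analyze the computational complexity and memory requirements of the proposed scheme, which verifies that it can select transmission parameters with extremely low computational complexity and memory requirements. Moreover, a large number of experiments were carried out on real-world LoRa devices to evaluate the effectiveness of the proposed scheme. The experimental results show the following. (1) Compared with other lightweight transmission-parameter selection schemes, the proposed scheme can efficiently avoid collisions between LoRa devices in LoRaWAN regardless of changes in the available channels. (2) The frame success rate (FSR) can be improved by selecting both the access channel and the SF rather than selecting only the access channel. (3) Because interference exists between adjacent channels, the FSR and fairness can be improved by increasing the spacing between adjacent available channels.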
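
A device-side selector that learns from ACKs alone can be sketched as follows. This is not the scheme proposed in the paper, whose learning rule the abstract does not specify. The sketch uses a generic epsilon-greedy bandit over (channel, SF) pairs that stores two integer counters per pair, and the ACK probabilities in the simulation are hypothetical.

```python
import random

class AckBandit:
    """Epsilon-greedy selection over (channel, SF) pairs using only ACK feedback."""

    def __init__(self, channels, spreading_factors, epsilon=0.1, seed=0):
        self.arms = [(c, sf) for c in channels for sf in spreading_factors]
        self.sent = [0] * len(self.arms)    # frames sent per pair
        self.acked = [0] * len(self.arms)   # ACKs received per pair
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self):
        """Explore with probability epsilon, otherwise pick the best observed ACK rate."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.arms))
        # Untried pairs get an optimistic rate of 1.0 so each is tried at least once.
        rates = [a / s if s else 1.0 for a, s in zip(self.acked, self.sent)]
        return max(range(len(self.arms)), key=rates.__getitem__)

    def update(self, arm, ack):
        self.sent[arm] += 1
        self.acked[arm] += int(ack)

def simulate(rounds=3000, seed=1):
    """Hypothetical environment: one pair is mostly collision-free, the others are congested."""
    bandit = AckBandit(channels=[0, 1, 2], spreading_factors=[7, 8])
    env = random.Random(seed)
    best_pair = (1, 7)
    for _ in range(rounds):
        arm = bandit.select()
        p_ack = 0.9 if bandit.arms[arm] == best_pair else 0.3
        bandit.update(arm, env.random() < p_ack)
    return bandit

if __name__ == "__main__":
    result = simulate()
    for pair, sent, acked in zip(result.arms, result.sent, result.acked):
        print(pair, "sent:", sent, "acked:", acked)
```

The state is two small integers per pair and each decision needs one division per pair, which is the kind of footprint that suits a constrained device.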

Wasserstein Graph Distance based on $L_1$-Approximated Tree Edit Distance between Weisfeiler-Lehman Subtrees

Zhongxi Fang , Jianming Huang , Xun Su , Hiroyuki Kasai

Categories: Machine Learning | Artificial Intelligence

2022-07-09

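The Weisfeiler-Lehman (WL) test has been widely applied to graph kernels, metrics, and neural networks. However, it considers only graph consistency, which results in weak descriptive power for structural information and thus limits the performance improvement of the methods that apply it. In addition, the similarity and distance between graphs defined by the WL test are coarse measurements. To the best of our knowledge, this paper clarifies these facts for the first time and defines a metric that we call the Wasserstein WL subtree (WWLS) distance. We introduce the WL subtree as structural information in the neighborhood of a node and assign it to each node. We then define a new graph embedding space based on the $L_1$-approximated tree edit distance ($L_1$-TED): the $L_1$ norm of the difference between node feature vectors in this space is the $L_1$-TED between the nodes. We further propose a fast algorithm for graph embedding. Finally, we use the Wasserstein distance to lift the $L_1$-TED to the graph level. The WWLS can capture small changes in structure that are difficult for traditional metrics to detect. We demonstrate its performance in several graph classification and metric validation experiments.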
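
For readers unfamiliar with the baseline this abstract builds on, the sketch below implements standard WL label refinement and compares graphs by the $L_1$ distance between their label histograms. It is the classical WL procedure, not the WWLS distance or the $L_1$-TED embedding, and the graphs and function names are illustrative assumptions.

```python
from collections import Counter

def wl_histogram(adj, labels, iterations=2):
    """Histogram of WL labels collected over all refinement rounds.

    adj: dict mapping each node to a list of its neighbours.
    labels: dict mapping each node to its initial label.
    """
    current = dict(labels)
    hist = Counter(current.values())
    for _ in range(iterations):
        # New label = own label plus the sorted multiset of neighbour labels.
        current = {
            v: (current[v], tuple(sorted(current[u] for u in adj[v])))
            for v in adj
        }
        hist.update(current.values())
    return hist

def l1_distance(h1, h2):
    """L1 distance between two label histograms (missing labels count as zero)."""
    return sum(abs(h1[k] - h2[k]) for k in set(h1) | set(h2))

def uniform_labels(adj):
    return {v: "x" for v in adj}

# Three small graphs on four nodes: a path, the same path with renamed nodes, and a star.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
path_renamed = {3: [2], 2: [3, 0], 0: [2, 1], 1: [0]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

h_path = wl_histogram(path, uniform_labels(path))
h_renamed = wl_histogram(path_renamed, uniform_labels(path_renamed))
h_star = wl_histogram(star, uniform_labels(star))

if __name__ == "__main__":
    print("path vs renamed path:", l1_distance(h_path, h_renamed))
    print("path vs star:", l1_distance(h_path, h_star))
```

Because two labels either match exactly or not at all, this histogram distance cannot express how similar two different subtrees are, which is the coarseness the abstract refers to.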

Towards Unifying Perceptual Reasoning and Logical Reasoning

Hiroyuki Kido

Categories: Artificial Intelligence

2022-06-27

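An increasing number of scientific experiments support the view of perception as Bayesian inference, which is rooted in Helmholtz's view of perception as unconscious inference. Recent studies of logic present a view of logical reasoning as Bayesian inference. In this paper, we give a simple probabilistic model that is applicable to both perceptual reasoning and logical reasoning. We show that the model unifies two essential processes common to perceptual and logical systems: on the one hand, the process by which perceptual and logical knowledge is derived from other knowledge, and on the other hand, the process by which such knowledge is derived from data. We fully characterize the model in terms of logical consequence relations.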

Aggregated Multi-output Gaussian Processes with Knowledge Transfer Across Domains

Yusuke Tanaka , Toshiyuki Tanaka , Tomoharu Iwata , Takeshi Kurashima , Maya Okawa , Yasunori Akagi , Hiroyuki Toda

Categories: Machine Learning (Statistics) | Machine Learning

2022-06-24

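Aggregate data often appear in various fields such as socio-economics and public security. Aggregate data are associated not with points but with supports (e.g., spatial regions in a city). Because the supports may vary depending on the attribute (e.g., poverty rate and crime rate), modeling such data is not straightforward. This paper offers a multi-output Gaussian process (MoGP) model that infers functions for attributes using multiple aggregate datasets of respective granularities. In the proposed model, the function for each attribute is assumed to be a dependent GP modeled as a linear mixing of independent latent GPs. We design an observation model with an aggregation process for each attribute; the process is an integral of the GP over the corresponding support. We also introduce a prior distribution over the mixing weights, which allows knowledge transfer across domains (e.g., cities) by sharing the prior. This is advantageous in situations where the spatially aggregated dataset in a city is too coarse to interpolate: the proposed model can still make accurate predictions of attributes by using aggregate datasets from other cities. Inference in the proposed model is based on variational Bayes, which makes it possible to learn the model parameters using aggregate datasets from multiple domains. The experiments demonstrate that the proposed model achieves superior performance in the task of refining coarse-grained aggregate data on real-world datasets: time series of air pollutants in Beijing and various kinds of spatial datasets from New York City and Chicago.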
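
The aggregation process in this abstract, an integral of the GP over a support, has a simple numerical counterpart: the covariance between two region-level observations is the kernel averaged over pairs of points drawn from the two regions. The sketch below approximates that average on grids of points. The RBF kernel, the region layout, and the use of a region average rather than an unnormalized integral are assumptions for illustration, not details from the paper.

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """RBF kernel matrix between point sets a (m, d) and b (n, d)."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / length_scale ** 2)

def region_covariance(regions, length_scale=1.0):
    """Covariance between region averages of a GP, approximated on sample points.

    regions: list of arrays, each of shape (m_i, d), holding points inside one support.
    """
    n = len(regions)
    cov = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            cov[i, j] = rbf(regions[i], regions[j], length_scale).mean()
    return cov

def grid(x0, x1, y0, y1, steps=5):
    """Regular grid of points covering the rectangle [x0, x1] x [y0, y1]."""
    xs, ys = np.meshgrid(np.linspace(x0, x1, steps), np.linspace(y0, y1, steps))
    return np.column_stack([xs.ravel(), ys.ravel()])

# Hypothetical supports: two adjacent unit squares and one coarse region covering both.
regions = [grid(0, 1, 0, 1), grid(1, 2, 0, 1), grid(0, 2, 0, 1)]
cov = region_covariance(regions)

if __name__ == "__main__":
    print(np.round(cov, 3))
```

Coarser regions average over more of the function, so their variance on the diagonal is smaller than that of the fine regions, which reflects why coarse aggregates carry less local information.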
